Imagine a setting where you have just had your family Christmas dinner. You all sit together around the fireplace drinking a hot chocolate, when your father suddenly turns on Despacito by Luis Fonsi. Although it is just a song like any other, the consensus will be that it is absolutely not a song that fits the setting.
In 2019, researchers found that the way people listen to music changes throughout the year (Park et al., 2019). The most famous example of a song that is mainly played at a certain time of the year is, of course, All I Want For Christmas Is You by Mariah Carey. Google search interest in the term ‘All I Want For Christmas’ over the past 5 years shows that its popularity spikes thousandfold in specific parts of the year [1]. Although the reasons may be clear for Mariah Carey’s track, there is no straightforward explanation for why people listen to certain songs more in the summer and to others more in the winter.
This storyboard looks into what makes a specific track a ‘summer song’ and what makes a track a ‘winter song’. This is done by comparing two official playlists created by Spotify. The first playlist is called Summer ’22 and the second is called The Winter Chill; both consist of 100 songs. They can be previewed and played on the right.
These playlists were chosen because they are both made by the official Spotify account. This suggests that they are a reasonable representation of what users consider to be summer and winter songs respectively. Because they are made by Spotify, they are also likely to contain less personal bias than user-made playlists, which makes them useful for general research. In this research, the Spotify API is used to discover the specific differences between summer and winter songs, and which features are more present in one category or the other.
BBC music journalist Greg Kot wrote in 2014 that summer hits of the last decades are united by the fact that “they’re energetic and at least sound upbeat, even when they’re not.” The expectation is therefore that songs in the summer category are more ‘happy’ and have a higher tempo. In addition, they are expected to have a high danceability, since the summer is also the season of festivals.
Winter songs are hypothesized to be slower and more melancholic (less happy) than summer songs. In the winter, people tend to be in a lower mood due to multiple factors (Lam et al., 2001), possibly leaving them less open to happy, cheerful music. For this reason, winter music is also expected to be more acoustic and instrumental.
For some reason the Spotify embedding doesn’t work. I have tried for hours and hours, but I get a really weird error message.
Link to the Summer playlist and the Winter playlist.
The error messages I get:
Are summer songs happier?
As said before, many people would describe summer music as happy and uplifting: music that you could dance to or sing along with. Winter music, by contrast, would be described as calmer and more melancholic. To test this, the Spotify API is used, which provides many features of a track, such as its loudness, instrumentalness or danceability.
The question now is how the ‘happiness’ of a song is measured. One of the audio features that Spotify provides is ‘energy’, which captures the intensity or activity of a song. Another feature is ‘valence’, which is described as a measure of musical positiveness, where a high valence means a happier or more euphoric track.
The figure on the left shows, for every track in both playlists, how it scores in terms of energy and valence. The blue winter songs score lower, on average, on both energy and valence than the summer songs. This suggests that a song with a high energy and/or valence is more likely to be considered a summer song!
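To make the "on average" comparison concrete, the sketch below computes the mean energy and mean valence per playlist. The track values are made up for illustration and are not the real playlist data; the names `summer`, `winter` and `centroid` are hypothetical as well.

```python
from statistics import mean

# Hypothetical (energy, valence) pairs, NOT the real playlist data.
summer = [(0.82, 0.71), (0.75, 0.64), (0.68, 0.80)]
winter = [(0.41, 0.30), (0.55, 0.42), (0.33, 0.25)]


def centroid(tracks):
    """Return the mean energy and the mean valence of a list of tracks."""
    energies, valences = zip(*tracks)
    return mean(energies), mean(valences)


summer_energy, summer_valence = centroid(summer)
winter_energy, winter_valence = centroid(winter)

print(f"summer: energy={summer_energy:.2f}, valence={summer_valence:.2f}")
print(f"winter: energy={winter_energy:.2f}, valence={winter_valence:.2f}")
```

With the real data, each list would hold one pair per track, so 100 pairs per playlist.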
Note: The goal was to make this page and some of the following pages interactive, and to let the user select which feature to show. This worked, but it was computationally very slow, which is why all histograms are instead placed next to each other as pictures, as seen on the left.
Acousticness
When picturing a song that suits the winter, people usually picture a more melancholic song, with an acoustic guitar or piano and slow vocals. The feature that best measures this is acousticness, which indicates to what extent a track is acoustic.
On the left we can see the acousticness of both the summer playlist and the winter playlist compared to each other in a single histogram.
The x-axis shows the acousticness scale, and the y-axis shows the number of tracks in the summer or winter playlist with that acousticness.
It is noticeable that the acousticness of winter songs is very diverse and ranges anywhere from 0 to 1. On average, however, winter songs are more acoustic than summer songs. The summer tracks are in general non-acoustic, with over 80 of the 100 tracks scoring below 0.25 on the acousticness scale.
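As a rough illustration of how such a histogram and the "share below 0.25" figure can be derived, the sketch below bins acousticness values into equal-width bins on the 0 to 1 scale. The values and the bin count are assumptions for illustration, not the real playlist data or the binning used in the figure.

```python
def histogram(values, n_bins=4):
    """Count values in n_bins equal-width bins spanning [0, 1]."""
    counts = [0] * n_bins
    for value in values:
        # min() keeps a value of exactly 1.0 inside the last bin.
        index = min(int(value * n_bins), n_bins - 1)
        counts[index] += 1
    return counts


def share_below(values, threshold):
    """Fraction of values strictly below the threshold."""
    return sum(value < threshold for value in values) / len(values)


# Hypothetical acousticness values, NOT the real playlist data.
summer_acousticness = [0.02, 0.10, 0.05, 0.31, 0.18]
winter_acousticness = [0.12, 0.45, 0.78, 0.93, 0.60]

print(histogram(summer_acousticness))          # counts per quarter of the scale
print(share_below(summer_acousticness, 0.25))  # share of low-acousticness tracks
```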
Keys
Besides acousticness, it is possible to generate a histogram of the keys of the tracks in the two playlists, with the keys on the x-axis (translated from actual key values to numbers) and the count on the y-axis.
The two histograms for the summer and winter playlists are quite comparable in terms of key. The only notable difference is that there are no summer songs in key number 3, which is D# (or Eb).
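The numbers on the x-axis follow standard pitch-class notation, where 0 is C and each step is one semitone higher, and Spotify reports -1 when no key was detected. A minimal sketch of the translation back to key names, and of finding keys that never occur, could look like this; the example list of keys is made up, and the function names are hypothetical.

```python
from collections import Counter

# Pitch-class notation: 0 = C, 1 = C#/Db, ..., 11 = B.
PITCH_CLASSES = ["C", "C#/Db", "D", "D#/Eb", "E", "F",
                 "F#/Gb", "G", "G#/Ab", "A", "A#/Bb", "B"]


def key_name(key):
    """Translate a numeric key (0-11, or -1 for 'no key') to a name."""
    if key == -1:
        return "no key detected"
    if not 0 <= key <= 11:
        raise ValueError(f"key must be between -1 and 11, got {key}")
    return PITCH_CLASSES[key]


def missing_keys(keys):
    """Names of the pitch classes that do not occur in the list at all."""
    counts = Counter(keys)
    return [key_name(k) for k in range(12) if counts[k] == 0]


# Hypothetical keys of a playlist, NOT the real data.
example_keys = [0, 1, 2, 4, 5, 6, 7, 8, 9, 10, 11, 0, 7]
print(missing_keys(example_keys))
```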
Tempo
The third histogram displays the tempo of the tracks on both playlists. Once again, the histogram is more spread out for winter songs than for summer songs. Most summer songs are around the 128 bpm region, shown by the high red bars in the middle. At low tempos in particular, summer songs become much less common!
Although the averages of both playlists are fairly close together (126 bpm for summer and 118 bpm for winter), the histogram is still insightful, since it shows the difference in the distributions of the two playlists.
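The point that similar averages can hide very different distributions can be shown with a mean and a standard deviation. The tempo values below are invented for illustration and are not the real playlist data.

```python
from statistics import mean, stdev

# Hypothetical tempos in bpm, NOT the real playlist data.
summer_tempo = [124, 126, 128, 128, 130]
winter_tempo = [80, 100, 120, 140, 160]

summer_mean, summer_spread = mean(summer_tempo), stdev(summer_tempo)
winter_mean, winter_spread = mean(winter_tempo), stdev(winter_tempo)

# The means are close, but the winter values are spread out far more widely.
print(f"summer: mean={summer_mean:.1f} bpm, sd={summer_spread:.1f}")
print(f"winter: mean={winter_mean:.1f} bpm, sd={winter_spread:.1f}")
```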
On the left we have both a keygram and a tempogram for the song “Don’t Break the Heart” by Tom Grennan. This song was chosen due to it having one of the highest valences in the summer playlist, and being a song that many people would classify as an outright summer song.
The keygram uses the sections as segmented by Spotify, which gave the best results. It uses Euclidean normalisation and the cosine distance function. The three choruses at around 50 seconds, 120 seconds and 170 seconds are fairly visible as well.
The tempogram is non-cyclic and very clearly shows that the song stays at around 122 to 123 beats per minute throughout, which is correct!
Timbre is a vector-valued feature, and it is possible to compare each specific timbre coefficient between the two playlists. On the left is the result of such a comparison.
It can be seen that there are clear differences in some timbre coefficients. Coefficients 1, 2, 4 and 10 look very distinct between the two playlists. However, it is not completely clear from the Spotify documentation what these features represent.
In order to further understand the structure of the ‘Don’t Break the Heart’ track by Tom Grennan, the two self-similarity matrices on the left were constructed.
Both matrices are segmented at the bar level. The matrix on the left is based on the chroma (or pitch) features and uses Manhattan normalisation and the Aitchison distance. The choruses are again visible, but less clearly than before. It also slightly shows the fade-out at the end.
The right matrix is based on the timbre features and uses Euclidean normalisation and the cosine distance function. Here the patterns are a lot more visible. The three choruses are clear, and the changes throughout the song are visible, especially the switch at around 85 seconds.
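For the combination of Euclidean normalisation and cosine distance, a self-similarity matrix can be sketched in a few lines. This is only an illustration of the idea, not the code used for the figures: the function names are hypothetical and the three-dimensional "bar" vectors are made up (Spotify's timbre vectors have twelve coefficients).

```python
import math


def l2_normalise(vector):
    """Scale a vector to unit Euclidean length (zero vectors stay as they are)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)


def cosine_distance(a, b):
    """1 minus the cosine similarity, assuming a and b have unit length."""
    return 1.0 - sum(x * y for x, y in zip(a, b))


def self_similarity(features):
    """Distance between every pair of segments; entry [i][j] compares i and j."""
    unit = [l2_normalise(vector) for vector in features]
    return [[cosine_distance(a, b) for b in unit] for a in unit]


# Hypothetical per-bar feature vectors, NOT real Spotify data.
bars = [[1.0, 0.0, 0.0], [2.0, 0.0, 0.0], [0.0, 3.0, 0.0], [1.0, 0.1, 0.0]]
ssm = self_similarity(bars)

for row in ssm:
    print(" ".join(f"{value:.2f}" for value in row))
```

Bars that sound alike get a distance close to 0, so repeated sections such as choruses show up as dark blocks and diagonal stripes away from the main diagonal.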
The question that now remains is, is there something we, or a computer to be more precise, can learn from all the previous data?
The histograms in the beginning and the timbre coefficients showed that there is a measurable difference between tracks from the summer playlist and tracks from the winter playlist. Based on the previous analysis, three features were chosen to train a K-Nearest Neighbour classifier: acousticness, valence and energy.
The visualization shows the performance of the K-Nearest Neighbour algorithm. After running a 5-fold cross-validation, the accuracy was 81%. The algorithm correctly recognized a song from the summer playlist 77 times and a song from the winter playlist 85 times. In total, it misclassified 38 of the 200 tracks. This suggests that, based on these two playlists, it is possible to make a fairly accurate distinction between summer and winter tracks using the three chosen features.
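To show what such a classifier and its evaluation do in principle, the sketch below implements a tiny K-Nearest Neighbour vote with a 5-fold cross-validation and then repeats the arithmetic behind the reported accuracy. It is not the code used for this storyboard: the value k = 3, the unshuffled fold split, the function names and the ten example tracks are all assumptions made for illustration.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Label `query` by majority vote among its k nearest training tracks."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def cross_val_accuracy(data, folds=5, k=3):
    """Use every `folds`-th track as test data in turn; return overall accuracy."""
    correct = 0
    for fold in range(folds):
        test = data[fold::folds]
        train = [item for i, item in enumerate(data) if i % folds != fold]
        correct += sum(knn_predict(train, x, k) == y for x, y in test)
    return correct / len(data)


# Hypothetical (acousticness, valence, energy) vectors, NOT the real tracks.
tracks = [
    ((0.05, 0.80, 0.85), "summer"), ((0.70, 0.30, 0.35), "winter"),
    ((0.10, 0.70, 0.75), "summer"), ((0.85, 0.20, 0.25), "winter"),
    ((0.08, 0.90, 0.80), "summer"), ((0.60, 0.40, 0.45), "winter"),
    ((0.15, 0.65, 0.90), "summer"), ((0.90, 0.25, 0.30), "winter"),
    ((0.03, 0.75, 0.70), "summer"), ((0.75, 0.35, 0.20), "winter"),
]
toy_accuracy = cross_val_accuracy(tracks)

# Arithmetic behind the accuracy reported in the text.
summer_correct, winter_correct, total_tracks = 77, 85, 200
reported_accuracy = (summer_correct + winter_correct) / total_tracks
misclassified = total_tracks - summer_correct - winter_correct

print(f"toy accuracy: {toy_accuracy:.2f}")
print(f"reported accuracy: {reported_accuracy:.2f}, misclassified: {misclassified}")
```

The toy data is deliberately well separated, so it says nothing about the real playlists; only the last three lines use numbers from the actual results.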
Conclusion and Discussion
The first analysis showed that there is a clear difference between the ‘happiness’ of songs that are considered to be summer songs and songs that are considered to be winter songs. Summer songs have both a higher valence and a higher energy, which are reasonable features to describe a song’s happiness. In future research, it could be tested whether other features that could indicate happiness, such as loudness or danceability, show the same pattern.
Looking at the acousticness, it was seen that winter songs have an almost uniform acousticness distribution, whereas summer tracks almost exclusively have a low acousticness, as was hypothesized.
There is a small difference in the distribution of tempo between winter and summer songs: summer songs have a slightly higher average tempo and a more concentrated distribution. The difference in keys was negligible.
The keygram showed the choruses quite clearly, but there wasn’t much other information in there that was accurately predicted. The tempogram very clearly showed the number of beats per minute throughout the whole song however.
The Spotify timbre coefficients showed a few very distinct differences between the two playlists for multiple coefficients. For further research, it would be advisable to dive deeper into this and find out what exactly those features are.
From the classifier it can be concluded that acousticness, valence and energy are three features that can help to accurately classify a track as either a summer track or a winter track. However, this research was based only on the two official Spotify playlists. In further research, it would be recommended to test the model on other playlists and investigate whether it can also correctly classify songs from outside these Spotify playlists.
In general, it can be very useful to have a deeper understanding of the characteristics of songs that are popular during certain times of the year. Artists, for example, could consider using this knowledge.
Thank you for the course! I wasn’t able to follow it the way I would have liked to do, but it was still a fun project and the course was well-taught.
All extra documentation, such as the Machine Learning code, can be found in the GitHub repository.
Kind regards
Kot, G. (2014). What makes the ‘song of summer’? Published on bbc.com. https://www.bbc.com/culture/article/20140620-what-makes-the-song-of-summer
Park, M., Thom, J., Mennicken, S., et al. (2019). Global music streaming data reveal diurnal and seasonal patterns of affective preference. Nature Human Behaviour, 3, 230–236. https://doi.org/10.1038/s41562-018-0508-z
Lam, R. W., Tam, E. M., Yatham, L. N., Shiah, I.-S., & Zis, A. P. (2001). Seasonal depression: The dual vulnerability hypothesis revisited. Journal of Affective Disorders, 63(1–3), 123–132. https://doi.org/10.1016/S0165-0327(00)00196-8